Open Data Quality Management Framework
Introduction

Data quality: “Driving profit and productivity”

Ensuring accurate data is a joint responsibility between Business and IT. It is not a once-off exercise but an ongoing effort that supports an effective organization. Improving the quality of information will reduce costs, improve productivity and increase profit.

This data quality framework is intended to provide a common, objective approach to assessing information and improving data and data quality. The approach is aligned with business expectations for quality information. It uses a range of methods and techniques to assess the quality of the data, and then applies consistent, embedded work practices to improve it. It also identifies the data quality priorities and encourages continuous improvement in data quality.

The following points are vital to a successful DQ programme:
*“Do we need the data? Is the data useful?”, seen from the business context.
*DQ is a continual programme and not just a project.
*DQ is everybody’s responsibility.
*Data should be seen as the most important asset.

Data Management Functions

The purpose of this document is to act as a framework for improving data quality through its components (phases) and processes. The document also explains how these components are used within the data quality assessment and management process, and serves as the basis for designing, implementing and executing enterprise-wide data quality policies and procedures.

The Framework

The framework consists of five focus areas: Scope, Objectives, Security, Governance and the cycles within each phase of the framework. Execution of the framework starts with defining the enterprise-wide objectives that need to be reached. Stakeholder input is vital in completing the objectives.
This input comes from business managers, users and data owners and reflects strategic goals; it does not always focus on the final objective, which is quality data as defined by KPIs. Once the objectives have been defined, the scope of the DQ programme is defined in accordance with them. Security of data forms a fundamental part of the framework, ensuring validity and integrity; it should be maintained and enforced throughout the programme. Governance ensures that the programme is executed in line with corporate governance, including IT, operational, corporate and data governance.

The framework phases are broken down into five (5) cycles:
*Who - The identified team roles that will be responsible for executing the phase cycles, focusing on the role and not on the individual.
*What - The work that needs to be done, with defined deliverables. What also covers the data that will be affected and the output documentation expected during the phase.
*How - The processes and methodology followed to execute the requirements.
*When - When the phase starts (project plan) and when specific deliverables are expected. During the extraction and load phases, the When determines the time at which extraction and loading are done.
*Where - Where the programme will be located within the organization and where the phase will be managed and executed from. Within this cycle, the location of the data also needs to be specified.

Each of the focus areas is discussed in detail below.

Scope

Organizations have various business units (BUs), and each BU has its own collection of needs and data quality issues residing in systems, databases, documentation and processes. Although data quality issues vary between BUs, they can all be resolved by applying an accepted, enterprise-wide framework that addresses both unique and general DQ problems.
Objectives

The current view of data quality is to manage data purely for BI reporting generated from a data warehouse, and to get the data into the data warehouse using an ETL approach. Data is extracted from system databases into a staging area, where it is transformed according to governance policies and data and business rules. Once the data has been transformed successfully and meets the data quality standards, it is loaded into the data warehouse, from which various reports are generated, e.g. ASR. A carefully planned framework and the policies and procedures that follow from it will reduce the risk of failure. Developing processes and technology that combine communication among different business functions will help to ensure high-quality information.

An important objective is to change the way business views data, from purely business reporting to knowledge through information management. Attention needs to be given to the quality of data during its transformation into the data warehouse and during data capture on the master database, but this is only a small building block in establishing an information powerhouse focused on data quality.

The main objective is broken down into short- and long-term objectives as follows:

Short Term Objectives

The short-term objectives are to address current data quality needs and to help present the advantages of the data quality management processes to business.

Finalize Scope. To formalize the DQ scope, the DQ Panel or steering committee should identify why the DQ management programme is required and what results are expected from it. To achieve this, DQ management and data quality issues should be analyzed; the current state can then be used as a frame of reference for the gap analysis that brings the scope into alignment with strategic requirements.
This can only be achieved when the necessary stakeholders participate in workshops initiated by the DQ team to determine the full extent of the problems and define the scope. The scope document, known as the ‘as-is’, clarifies why the DQ programme is needed, the results expected, the implications of not continuing with the programme and any future justification for implementing it. Stakeholder participation is crucial when designing the scope document, in order to establish the current state, the expectations and the challenges that the programme faces. Only then can expenditure be justified and any savings or revenue-generating advantages of the programme be shown. Although the components of the framework will highlight many aspects of the scope, further scope implications will be identified as the programme is implemented and executed.

Stakeholder support. Data quality management requires a substantial investment of money, time and resources. It is therefore essential that short-term initiatives yield a high return on investment and meet stakeholders’ expectations. As soon as stakeholders can see the advantage of investing in DQ management, the long-term objectives will have the necessary support. Short-term objectives, or small wins, are therefore crucial to the success of data quality.

Define Data Owners. Identifying data owners enables communication channels with the right people. Data owners are responsible for the data within their business area. Data enables business; therefore business should own its data, make decisions about it in terms of security, maintenance, quality and value, and be an active participant in the DQ programme.

Define Data Quality Office. It is important to identify a point of reference for the DQ programme. People need to know where to go with any issue related to data quality or data governance. The quality office will govern projects so that they align with data quality standards.
Define team members and roles. Data quality must be supported by everyone within the organization, not just IT. Involving experts from business is essential for success. Users and business are the best source of information about the context of data, and IT is the best source of solutions to system and data problems. Each team member has a clearly defined role in making the initiative a success and must be accountable for their part.

Identify the right technologies. When the right technology is built around a solid data quality framework, it will accelerate the discovery, design, development and deployment of the data quality programme across projects, departments and the enterprise. The right technology forms part of the solution to problems around the quality of data and the standards for its use.

Current Data Quality Needs. Quality risk assessments demand immediate action to show immediate results. Although short-term wins indicate successful processes, data quality should not be seen as delivering quick results but rather as continual improvement and sustainable quality. That said, the need for BI reporting is clear and present. The ETL process of getting data into the data warehouse for BI reporting has shown that the current state of the data is undesirable and that data quality guidelines (policies and standards) are needed to present business with accurate reporting (e.g. ASR). The data cleansing procedures, data transformation rules and business requirements need to be documented and applied to current and future data transformation processes.

Many of the short-term objectives will affect the long-term objectives, and lessons learned will help to create reusable processes and to refine the standards already defined.

Long Term Objectives

To stop poor data quality and ensure long-term sustainability, the quality of information needs to be managed, maintained and monitored regularly.
To accomplish these objectives, organizations need a data quality solution that works with all types of data, business processes and systems, and that profiles, discovers, improves, monitors and governs data.

Issues/Risks

Although risks will be determined during the execution of each component of the framework, certain risks and issues that cover the whole framework can be determined beforehand. It is therefore very important to compile a risk management plan at the same level as scope and objectives, listing known issues and risks and addressing them with the necessary controls. Typical risks include a lack of resources, budget, technology or knowledge, and ever-changing environments and applications.

Security

A security policy and procedures are designed to mitigate operational, strategic and regulatory compliance risks, and to guide the allocation of CRUD (create, read, update, delete) roles to DQ team members and to users of data within systems. The scope includes protecting the confidentiality, integrity and availability of information, and the access granted to resources, systems and corporate information.

Security objects:
*Data security policy.
*Data security procedures.
*Data backup and retrieval.
*Data retention and data archive policies.
*Data protection policies.
*Disaster recovery plan.
*Data privacy.

Governance

Although a data governance document is expected to be completed during the design phase of the framework, this governance component addresses adherence to company-wide governance policies and procedures. In essence, data governance encapsulates the entire range of policies, procedures and standards generated within the DQ programme, with the framework defining the scope of work and the execution of the DQ programme phases.

Framework Components

Executive Sponsorship/Stakeholder Support

To accomplish the long-term data quality programme, the first milestone is to get stakeholders’ endorsement.
Because of the large investment in people, infrastructure, money and time, the short-term objectives must succeed without failures: they should indicate the quality of data based on DQ metrics and link the results to lost opportunities and revenue. Stakeholder participation and support does not stop at initial approval but continues through ongoing participation and progress communication. Without stakeholders’ approval, the programme will fail.

The business case needs to be developed by assessing the negative impacts of poor data quality across a number of categories: decreased revenues, increased costs, increased risk, decreased confidence and poor customer satisfaction. To convince stakeholders it is not enough to highlight the negative side of poor data quality; the positive impact of good data quality on return on investment should be highlighted as well.

Design and Planning

During this cycle, the actual information architecture and analysis start on the identified source systems. The entire DQ team works together with the identified data owners to complete the policies, procedures, repositories, KPIs, test cases, dashboards and any other artifacts generated (see 7. Derived Design and Planning Artifacts). These artifacts need time and planning in order to reach solutions that are accepted by all stakeholders. During this cycle, the extent of the data quality implications will become clear, and fixed time schedules can be compiled and implemented. The cycle will present a clear roadmap of the challenges faced in integrating data and information, which is needed to develop processes that address those challenges proactively.

Extract

This phase involves extracting data from one or more data sources into the staging area. The data will subsequently be loaded into the EDW or, as clean data, back into the data source or production environment during the Load and Implement phase.
The data to extract is determined by the business requirements from the preceding Design and Planning phase. Extracted data can be seen as a data view based on business requirements and on the framework scope and objectives. Extraction processes need to be followed precisely: data can change within seconds, so extracted data is tied to its time of extraction, and the necessary controls must be in place to manage the difference between incremental and full extraction. A smooth extraction process can be accomplished when the staging area is predefined and implemented and the necessary technology requirements and security protocols are in place. During and after extraction, the data needs to be audited, using test cases and data profiling, to make sure all relevant data was received.

Evaluate and Analyze

In this phase the team must take on the actual data, using the available technologies to facilitate data analysis and to understand the data mappings and the relationship of the Data Quality Dimensions to business processes. Data profiling is used to identify potential and current problems within fields, e.g. contact data, address data, customer codes and all information that should adhere to the data policies and standards. Test cases should be developed from the analysis findings and business processes in order to validate corrective actions and to uncover further anomalies. Test cases also support any auditing processes used to estimate the effectiveness of the analysis.

Transform

Once the data is available in the staging area, it is transformed from bad-quality data into good-quality data based on the standards and policies resulting from the previous phase, focusing on quality dimensions and business data requirements. Throughout the transformation process, changes are tracked and audits are done to make sure the transformation adheres to governance.
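
As a rough illustration of tracked transformation, the sketch below applies cleansing rules to staged records and writes one audit entry for every value that changes. The rule names, field names and the `Change` record are hypothetical and are not part of the framework; a real programme would derive its rules from its own data policies and standards.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Change:
    """One audit entry: which rule changed which field of which record."""
    record_id: int
    field: str
    old: str
    new: str
    rule: str


# Hypothetical cleansing rules: field name -> (rule name, cleansing function).
RULES: dict[str, tuple[str, Callable[[str], str]]] = {
    "email": ("lowercase_email", lambda v: v.strip().lower()),
    "city": ("title_case_city", lambda v: " ".join(v.split()).title()),
}


def transform(records):
    """Return cleaned copies of the staged records plus an audit trail."""
    cleaned, audit = [], []
    for rec in records:
        out = dict(rec)  # never modify the staged record in place
        for field, (rule_name, fn) in RULES.items():
            if field in out:
                new = fn(out[field])
                if new != out[field]:
                    audit.append(Change(rec["id"], field, out[field], new, rule_name))
                    out[field] = new
        cleaned.append(out)
    return cleaned, audit


staged = [
    {"id": 1, "email": " Alice@Example.COM ", "city": "cape  town"},
    {"id": 2, "email": "bob@example.com", "city": "Durban"},
]
cleaned, audit = transform(staged)
for change in audit:
    print(change)
```

Because the staged records are copied rather than modified, the audit trail can be checked against both the original and the cleaned data before anything is loaded.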
Load and Implement

Once the audit of the data transformation has been accepted, the cleaned data can either be loaded into the data warehouse, for example for BI purposes, or implemented back into the data source(s). Implementing or loading transformed data back into the original data sources is a very risky process and should be well planned before action is taken. There are many risk areas to take into account, from triggers to business rules. Most important during this phase is the auditing and monitoring of every load, to make sure that the correct data was loaded and that it adheres to data governance.

Monitor and Verification

Being able to measure, monitor and verify data quality throughout the DQ programme lifecycle is essential to the active management of ongoing data quality improvement and data governance policies. Organizations need to formalize targets, measure conformance and communicate concrete data quality metrics to management and data owners. DQ metrics provide a unified view of data and data quality, and can also provide the basis for reporting. DQ monitoring and verification based on a well-defined set of metrics provides important knowledge about the value of the data and of the framework in use, and gives business the ability to determine how the data can best be used to meet its own needs.

Monitoring has two aspects: first, monitoring the progress of the existing DQ programme; and second, monitoring current data behavior against the metrics designed during the Design and Planning phase. This coincides with verifying the effort through auditing processes carried out on the existing ETL programme. Monitoring results can be presented in various forms, from online dashboards to printed reports.
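
A minimal sketch of measuring conformance against formalized targets might look like the following. The two metrics (completeness and validity of an `email` field), the target values and the simple email pattern are assumptions made for illustration only; actual metrics and targets would come from the Design and Planning phase.

```python
import re

# Hypothetical targets: minimum acceptable score for each metric.
TARGETS = {"email_completeness": 0.95, "email_validity": 0.90}

# Deliberately simple pattern, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(records, field):
    """Share of records where the field is present and non-blank."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if (r.get(field) or "").strip())
    return filled / len(records)


def validity(records, field, pattern):
    """Share of non-blank values that match the pattern."""
    values = [r[field].strip() for r in records if (r.get(field) or "").strip()]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def report(records):
    """Score each metric and flag whether it meets its target."""
    scores = {
        "email_completeness": completeness(records, "email"),
        "email_validity": validity(records, "email", EMAIL_RE),
    }
    return {
        name: {"score": score, "target": TARGETS[name], "ok": score >= TARGETS[name]}
        for name, score in scores.items()
    }


sample = [
    {"email": "a@example.com"},
    {"email": "b@example.com"},
    {"email": "not-an-email"},
    {"email": ""},
]
result = report(sample)
for name, row in result.items():
    print(name, row)
```

The resulting dictionary could feed either a dashboard or a printed report; how it is presented is left open here.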
Communication

Communication consists of informing users of progress through continuous employee feedback, training on data quality standards and creating a data quality culture. Throughout the data quality programme it is essential to communicate to employees the progress made and the policies and standards implemented. Various processes will be put in place in systems, and the impact of each of these processes needs to be relayed to users. It is essential to train system users regularly as new processes are implemented, covering the impact of those changes on the behavior of systems. Introduction and/or induction courses are ideal places to educate new employees about the data quality standards and policies. Although monitoring will indicate which users need extra training, regular informative material still needs to be distributed to all employees to reaffirm the need to be attentive to the quality of data.

The best way to make sure that data is of high quality is to establish quality as a culture. No amount of training or system processes will ever substitute for quality as a culture. Continual campaigns therefore need to run to keep users aware of the advantages of good quality.

Improve and Optimize

As systems change and new business needs are implemented, data changes and evolves. This requires continuous assessment and monitoring of the quality of the data flowing through the various systems; the data quality programme has no end date, only ends of cycles. At the end of each cycle it is important to evaluate its effectiveness against the set metrics and quality KPIs. Any weaknesses identified are prioritized against business requirements, and the programme is improved and optimized. Executing this phase will rationalize the programme into a more productive process that needs fewer resources, with better process monitoring and fewer failures.
Roles Composition

Derived Design and Planning Artifacts

Data Quality Dimensions

Data Quality Severity Criteria

Rating
0 - None
1 - Low
2 - Minor
3 - Medium
4 - High
5 - Critical

Example: Valid Address (not actual rating)
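
The severity scale can be carried into code as an ordered enumeration, which makes it straightforward to sort issues by priority. In the sketch below only the six rating levels come from the scale above; the issue names and the ratings assigned to them are hypothetical and, like the example above, are not actual ratings.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Data quality severity ratings, 0 (None) to 5 (Critical)."""
    NONE = 0
    LOW = 1
    MINOR = 2
    MEDIUM = 3
    HIGH = 4
    CRITICAL = 5


# Hypothetical issue log: (data quality criterion, assigned severity).
issues = [
    ("Valid Address", Severity.MEDIUM),
    ("Customer Code", Severity.CRITICAL),
    ("Contact Data", Severity.LOW),
]

# Highest severity first, so the most urgent issues are handled first.
prioritized = sorted(issues, key=lambda issue: issue[1], reverse=True)
for name, severity in prioritized:
    print(f"{int(severity)} - {severity.name.title()}: {name}")
```

Using `IntEnum` keeps the numeric rating and its label together, so comparisons such as `severity >= Severity.HIGH` work directly.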